Abstract: This paper introduces a new framework of fast and efficient sensing matrices for practical compressive sensing, called Structurally Random Matrix (SRM). In the proposed framework, we pre-randomize a sensing signal by scrambling its samples or flipping its sample signs, then fast-transform the randomized samples, and finally subsample the transform coefficients as the final sensing measurements. SRM is highly relevant for large-scale, real-time compressive sensing applications as it has fast computation and supports block-based processing. In addition, we show that SRM has theoretical sensing performance comparable with that of completely random sensing matrices. Numerical simulation results verify the validity of the theory and illustrate the promising potential of the proposed sensing framework.

Index Terms: compressed sensing, compressive sensing, random projection, sparse reconstruction, fast and efficient algorithm



I. Introduction 

Compressed sensing (CS) [1], [2] has attracted a lot of interest over the past few years as a revolutionary signal sampling paradigm. Suppose that $x$ is a length-$N$ signal. It is said to be $K$-sparse (or compressible) if $x$ can be well approximated using only $K \ll N$ coefficients under some linear transform:

$$x = \Psi\alpha,$$

where $\Psi$ is the sparsifying basis and $\alpha$ is the transform coefficient vector that has $K$ (significant) nonzero entries.

According to the CS theory, such a signal can be acquired through the following random linear projection:

$$y = \Phi x + e,$$

where $y$ is the sampled vector with $M \ll N$ data points, $\Phi$ represents an $M \times N$ random matrix and $e$ is the acquisition noise. The CS framework is attractive as it implies that $x$ can be faithfully recovered from only $M = O(K \log N)$ measurements, suggesting the potential of significant cost reduction in digital data acquisition.

While the sampling process is simply a random linear projection, the reconstruction to find the sparsest signal from the received measurements is a highly non-linear process. More precisely, the reconstruction algorithm solves the $\ell_1$-minimization of a transform coefficient vector:

$$\min \|\alpha\|_1 \quad \text{s.t.} \quad y = \Phi\Psi\alpha.$$

Linear programming [1], [2] and other convex optimization algorithms [3], [4], [5] have been proposed to solve the $\ell_1$-minimization. Furthermore, there also exists a family of greedy pursuit algorithms [6], [7], [8], [9], [10] offering another promising option for sparse reconstruction. These algorithms all need to compute $\Phi\Psi$ and $(\Phi\Psi)^T$ multiple times. Thus, the computational complexity of the system depends on the structure of the sensing matrix $\Phi$ and its transpose $\Phi^T$.

Preferably, the sensing matrix $\Phi$ should be highly incoherent with the sparsifying basis $\Psi$, i.e. rows of $\Phi$ do not have any sparse representation in the basis $\Psi$. Incoherence between two matrices is mathematically quantified by the mutual coherence coefficient [11].

Definition I.1. The mutual coherence of an $N \times N$ orthonormal matrix $\Phi$ and another $N \times N$ orthonormal matrix $\Psi$ is defined as:

$$\mu(\Phi, \Psi) = \max_{1 \le i,j \le N} |\langle \Phi_i, \Psi_j \rangle|$$

where $\Phi_i$ are rows of $\Phi$ and $\Psi_j$ are columns of $\Psi$, respectively.

If $\Phi$ and $\Psi$ are two orthonormal matrices, $\|\Phi\Psi_j\|_2 = \|\Psi_j\|_2 = 1$. Thus, it is easy to see that for two orthonormal matrices $\Phi$ and $\Psi$, $1/\sqrt{N} \le \mu \le 1$. Incoherence implies that the mutual coherence, or the maximum magnitude of entries of the product matrix $\Phi\Psi$, is relatively small. Two matrices are completely incoherent if their mutual coherence coefficient approaches the lower bound value of $1/\sqrt{N}$.
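As a small numerical illustration of the definition above, the following sketch evaluates the mutual coherence for two extreme pairs of orthonormal matrices. The helper names `hadamard` and `mutual_coherence` are hypothetical, and the choice of the Walsh-Hadamard and identity matrices is an assumption made only for this example.

```python
import numpy as np

def hadamard(n):
    """Orthonormal Walsh-Hadamard matrix (Sylvester construction); n must be a power of 2."""
    h = np.array([[1.0]])
    while h.shape[0] < n:
        h = np.block([[h, h], [h, -h]])
    return h / np.sqrt(n)

def mutual_coherence(phi, psi):
    """Largest |<phi_i, psi_j>| over rows phi_i of phi and columns psi_j of psi."""
    return np.max(np.abs(phi @ psi))

N = 64
identity = np.eye(N)
mu_low = mutual_coherence(hadamard(N), identity)   # Hadamard rows vs. spike basis
mu_high = mutual_coherence(identity, identity)     # a basis measured against itself
print(mu_low, 1 / np.sqrt(N), mu_high)
```

The first pair attains the lower bound $1/\sqrt{N}$ and the second attains the upper bound 1, matching the range stated above.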

A popular family of sensing matrices is a random projection, or a random matrix of i.i.d. random variables from a sub-Gaussian distribution such as Gaussian or Bernoulli [12], [13]. This family of sensing matrices is well known as it is universally incoherent with all other sparsifying bases. For example, if $\Phi$ is a random matrix of Gaussian i.i.d. entries and $\Psi$ is an arbitrary orthonormal sparsifying basis, the sensing matrix in the transform domain $\Phi\Psi$ is also a Gaussian i.i.d. matrix. The universal property of a sensing matrix is important because it enables us to sense a signal directly in its original domain without significant loss of sensing efficiency and without any other prior knowledge. In addition, it can be shown that random projection approaches the optimal sensing performance of $M = O(K \log N)$.

However, it is quite costly to realize random matrices in practical sensing applications as they require very high computational complexity and huge memory buffering due to their completely unstructured nature [14]. For example, to process a 512 x 512 image with 64K measurements (i.e., 25% of the original sampling rate), a Bernoulli random matrix requires on the order of gigabytes of storage and giga-flops of computation, which makes both the sampling and recovery processes very expensive and, in many cases, unrealistic.
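The order of magnitude quoted above can be checked with a few lines of arithmetic. This is only a back-of-the-envelope sketch: packing each Bernoulli entry into a single bit and counting one multiply plus one add per entry are assumptions of this example, not figures taken from the text.

```python
n = 512 * 512                    # signal length N
m = n // 4                       # 25% sampling rate: 65536 = 64K measurements
entries = m * n                  # entries of a dense M x N sensing matrix
bytes_1bit = entries // 8        # assumed storage: one bit per +/-1 entry
gib = bytes_1bit / 2**30         # storage in GiB
flops_per_product = 2 * entries  # assumed cost: one multiply and one add per entry
print(m, entries, gib, flops_per_product)
```

Even with this most compact storage, a single matrix-vector product touches about $1.7 \times 10^{10}$ entries, and iterative reconstruction algorithms repeat such products many times.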

Another class of sensing matrices is a uniformly random subset of rows of an orthonormal matrix, of which the partial Fourier matrix (or the partial FFT) is a special case [13], [14]. While the partial FFT is well known for having a fast and efficient implementation, it only works well in the transform domain or in the case that the sparsifying basis is the identity matrix. More specifically, it is shown in [[14], Theorem 1.1] that the minimal number of measurements required for exact recovery depends on the incoherence of $\Phi$ and $\Psi$:

$$M = O(\mu_n^2 K \log N) \qquad (1)$$

where $\mu_n$ is the normalized mutual coherence: $\mu_n = \sqrt{N}\mu$ and $1 \le \mu_n \le \sqrt{N}$. With many well-known sparsifying bases such as wavelets, this mutual coherence coefficient might be large, resulting in performance loss. Another approach is to design a sensing matrix to be incoherent with a given sparsifying basis. For example, Noiselets is designed to be incoherent with the Haar wavelet basis in [15], i.e. $\mu_n = 1$ when $\Phi$ is the Noiselets transform and $\Psi$ is the Haar wavelet basis. Noiselets also has a low-complexity $O(N \log N)$ implementation, although it is unknown whether Noiselets is also incoherent with other bases.

II. Compressive Sensing with Structurally 
Random Matrices 

A. Overview 

One of the remaining challenges for CS in practice is to design a CS framework that has the following features:

• Optimal or near optimal sensing performance: the number of measurements for exact recovery approaches the minimal bound, i.e. on the order of $O(K \log N)$;

• Universality: sensing performance is equally good with almost all sparsifying bases;

• Low complexity, fast computation and block-based processing support: these features of the sensing matrix are desired for large-scale, real-time sensing applications;

• Hardware/Optics implementation friendliness: entries of the sensing matrix only take values in the set $\{0, 1, -1\}$.

In this paper, we propose a framework that aims to satisfy the above wish-list, called Structurally Random Matrix (SRM), that is defined as a product of three matrices:

$$\Phi = \sqrt{\frac{N}{M}}\, D F R \qquad (2)$$

where:

• $R$ is an $N \times N$ matrix that is either a uniform random permutation matrix or a diagonal random matrix whose diagonal entries $R_{ii}$ are i.i.d. Bernoulli random variables with identical distribution $P(R_{ii} = \pm 1) = 1/2$. A uniformly random permutation matrix scrambles the signal's sample locations globally while a diagonal matrix of Bernoulli random variables flips the signal's sample signs locally. Hence, we often refer to the former as the global randomizer and the latter as the local randomizer.

• $F$ is an $N \times N$ orthonormal matrix that, in practice, is selected to be fast computable, such as popular fast transforms: FFT, DCT, WHT or their block diagonal versions. The purpose of the matrix $F$ is to spread information (or energy) of the signal's samples over all measurements.

• $D$ is an $M \times N$ subsampling matrix/operator. The operator $D$ selects a random subset of rows of the matrix $FR$. If the probability of selecting a row is $M/N$, the number of rows selected would be $M$ on average. In matrix representation, $D$ is simply a random subset of $M$ rows of the identity matrix of size $N \times N$.

The scale coefficient $\sqrt{N/M}$ normalizes the transform so that the energy of the measurement vector is approximately equal to that of the input signal vector.
Equivalently, the proposed sensing algorithm SRM contains 
3 steps: 

• Step 1 (Pre-randomize): Randomize a target signal by 
either flipping its sample signs or uniformly permuting 
its sample locations. This step corresponds to multiplying 
the signal with the matrix R 

• Step 2 (Transform): Apply a fast transform F to the 
randomized signal 

• Step 3 (Subsample): randomly pick up M measurements 
out of N transform coefficients. This step corresponds to 
multiplying the transform coefficients with the matrix D 
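The three steps above can be sketched as follows. This is a minimal illustration, not the authors' implementation: the function names `fwht` and `srm_sense` are hypothetical, the Walsh-Hadamard transform is picked as the fast transform $F$, and $D$ is realized by drawing exactly $M$ distinct rows.

```python
import numpy as np

def fwht(v):
    """Orthonormal fast Walsh-Hadamard transform; len(v) must be a power of 2."""
    a = np.asarray(v, dtype=float).copy()
    n = a.size
    h = 1
    while h < n:
        for i in range(0, n, 2 * h):
            x = a[i:i + h].copy()
            y = a[i + h:i + 2 * h].copy()
            a[i:i + h] = x + y
            a[i + h:i + 2 * h] = x - y
        h *= 2
    return a / np.sqrt(n)

def srm_sense(x, m, rng, randomizer="local"):
    """Return (y, rows): y = sqrt(N/M) * D F R x and the row indices kept by D."""
    n = x.size
    if randomizer == "local":
        xr = rng.choice([-1.0, 1.0], size=n) * x   # Step 1: flip sample signs
    else:
        xr = x[rng.permutation(n)]                 # Step 1: permute sample locations
    coeffs = fwht(xr)                              # Step 2: fast transform
    rows = rng.choice(n, size=m, replace=False)    # Step 3: keep M of N coefficients
    return np.sqrt(n / m) * coeffs[rows], rows

rng = np.random.default_rng(0)
x = rng.standard_normal(64)
y, rows = srm_sense(x, 16, rng, randomizer="global")
print(y.shape)
```

The sensing cost is dominated by the $O(N \log N)$ transform, and no $M \times N$ matrix is ever stored: only the random signs (or permutation) and the selected row indices are kept.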

A conventional CS reconstruction algorithm is employed to recover the transform coefficient vector $\alpha$ by solving the $\ell_1$-minimization:

$$\hat{\alpha} = \arg\min \|\alpha\|_1 \quad \text{s.t.} \quad y = \Phi\Psi\alpha. \qquad (3)$$

Finally, the signal is recovered as $\hat{x} = \Psi\hat{\alpha}$. The framework achieves perfect reconstruction if $\hat{x} = x$.

To the best of our knowledge, the proposed sensing algorithm is distinct from currently existing methods such as random projection [16], random filters [17], structured Toeplitz [18] and random convolution [19] via the first step of pre-randomization. Its main purpose is to scramble the structure of the signal, converting the sensing signal into a white-noise-like one to achieve universally incoherent sensing.

Depending on the specific application, SRM can offer computational benefits either at the sensing process or at the signal reconstruction process. For applications that allow us to perform the sensing operation by computing the complete transform $F$, we can exploit the fast computation of the matrix $F$ at the sensing side. However, if it is required to precompute $DFR$ (and then store it in memory for future sensing operations), there would not be any computational benefit at the sensing side. In this case, we can still exploit the structure of SRM to speed up the signal recovery at the reconstruction side, because in most $\ell_1$-minimization algorithms [3] the majority of the computational complexity is spent on computing the matrix-vector multiplications $Au$ and $A^T u$, where $A = \Phi\Psi$. Note that both $A$ and $A^T$ are fast computable if the sparsifying matrix $\Psi$ is fast computable, i.e. their computational complexity is on the order of $O(N \log N)$. In addition, when $F$ is selected to be the Walsh-Hadamard matrix, the SRM entries only take values in the set $\{-1, 1\}$, which is friendly for hardware/optics implementation.
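A sketch of how the products $Au$ and $A^T u$ might be applied without forming any matrix is given below. It assumes the local randomizer, the Walsh-Hadamard transform as $F$, and, purely for brevity, $\Psi$ equal to the identity so that $A = \Phi$; `fwht` and `make_srm` are hypothetical names.

```python
import numpy as np

def fwht(v):
    """Orthonormal fast Walsh-Hadamard transform; len(v) must be a power of 2."""
    a = np.asarray(v, dtype=float).copy()
    n = a.size
    h = 1
    while h < n:
        for i in range(0, n, 2 * h):
            x = a[i:i + h].copy()
            y = a[i + h:i + 2 * h].copy()
            a[i:i + h] = x + y
            a[i + h:i + 2 * h] = x - y
        h *= 2
    return a / np.sqrt(n)

def make_srm(n, m, rng):
    """Build matrix-free forward/adjoint operators for A = sqrt(N/M) D F R."""
    signs = rng.choice([-1.0, 1.0], size=n)       # local randomizer R (diagonal)
    rows = rng.choice(n, size=m, replace=False)   # rows kept by the subsampler D
    scale = np.sqrt(n / m)

    def forward(u):
        # A u: randomize, transform, subsample
        return scale * fwht(signs * u)[rows]

    def adjoint(v):
        # A^T v: zero-fill (D^T), transform (this F is symmetric), undo signs (R^T = R)
        full = np.zeros(n)
        full[rows] = v
        return scale * signs * fwht(full)

    return forward, adjoint

rng = np.random.default_rng(0)
forward, adjoint = make_srm(128, 32, rng)
u, v = rng.standard_normal(128), rng.standard_normal(32)
print(forward(u) @ v, u @ adjoint(v))   # the two inner products should agree
```

Checking that $\langle Au, v\rangle = \langle u, A^T v\rangle$ is a standard sanity test before plugging such operators into an iterative solver.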

The remainder of the paper is organized as follows. We first discuss the incoherence between SRMs and sparsifying transforms in Section III. More specifically, Section III-A gives a rough intuition of why SRM has sensing performance comparable with Gaussian random matrices. A detailed quantitative analysis of the incoherence for SRMs with the local randomizer and the global randomizer is presented in Section III-B. Based on these incoherence results, the theoretical performance of the proposed framework is analyzed in Section IV, followed by experimental validation in Section V. Finally, Section VI concludes the paper with a detailed discussion of the practical advantages of the proposed framework and the relationship between the proposed framework and other related works.



B. Notations 

We reserve a bold letter for a vector, a capital and bold letter for a matrix, a capital and bold letter with one sub-index for a row or a column of a matrix and a capital letter with two sub-indices for an entry of a matrix. We often employ $x \in \mathbb{R}^N$ for the input signal, $y \in \mathbb{R}^M$ for the measurement vector, $\Phi \in \mathbb{R}^{M \times N}$ for the sensing matrix, $\Psi \in \mathbb{R}^{N \times N}$ for the sparsifying matrix and $\alpha \in \mathbb{R}^N$ for the transform coefficient vector ($x = \Psi\alpha$). We use the notation supp($z$) to indicate the index set (or coordinate set) of nonzero entries of the vector $z$. Occasionally, we also use $T$ to alternatively refer to this index set of nonzero entries (i.e., $T$ = supp($z$)). In this case, $z_T$ denotes the portion of vector $z$ indexed by the set $T$ and $\Psi_T$ denotes the submatrix of $\Psi$ whose columns are indexed by the set $T$.

Let $A = FR$ and let $S_{ij}$, $F_{ij}$ be the entries at the $i$-th row and the $j$-th column of $A\Psi$ and $F$, $R_{kk}$ be the $k$-th entry on the diagonal of the diagonal matrix $R$, and $A_i$ and $\Psi_j$ be the $i$-th row of $A$ and the $j$-th column of $\Psi$, respectively.

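In addition, we also employ the following notations:

• $x_n$ is on the order of $o(z_n)$, denoted as $x_n = o(z_n)$, if

$$\lim_{n \to \infty} \frac{x_n}{z_n} = 0.$$

• $x_n$ is on the order of $O(z_n)$, denoted as $x_n = O(z_n)$, if

$$\lim_{n \to \infty} \frac{x_n}{z_n} = c,$$

where $c$ is some positive constant.

• A random variable $X_n$ is called asymptotically normally distributed $\mathcal{N}(0, \sigma^2)$, if

$$\lim_{n \to \infty} P\left(\frac{X_n}{\sigma} \le x\right) = \frac{1}{\sqrt{2\pi}} \int_{-\infty}^{x} e^{-y^2/2}\, dy.$$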

III. Incoherence Analysis 
A. Asymptotical Distribution Analysis 

If $\Phi$ is an i.i.d. Gaussian matrix $\mathcal{N}(0, \frac{1}{N})$ and $\Psi$ is an arbitrary orthonormal matrix, $\Phi\Psi$ is also an i.i.d. Gaussian matrix $\mathcal{N}(0, \frac{1}{N})$, implying that with overwhelming probability, a Gaussian matrix is highly incoherent with all orthonormal $\Psi$. In other words, the i.i.d. Gaussian matrix is universally incoherent with fixed transforms (with overwhelming probability). In this section, we will argue that under some mild conditions, with $\Phi = DFR$, where $D$, $F$, $R$ are defined as in the previous section, entries of $\Phi\Psi$ are asymptotically normally distributed $\mathcal{N}(0, \sigma^2)$, where $\sigma^2 \le O(\frac{1}{N})$. This claim is illustrated in Fig. 1, which depicts the quantile-quantile (QQ) plots of entries of $\Phi\Psi$, where $N = 256$, $F$ is the 256 x 256 DCT matrix and $\Psi$ is the Daubechies-8 orthogonal wavelet basis. Fig. 1(a) and Fig. 1(b) correspond to the case where $R$ is the local and global randomizer, respectively. In both cases, the QQ plots appear straight, as the Gaussian model demands.

Note that $\Phi$ is a submatrix of $A = FR$. Thus, the asymptotic distribution of the entries of $A\Psi$ is similar to that of the entries of $\Phi\Psi$.

Before presenting the asymptotic theoretical analysis, we introduce the following assumptions for the local and global randomization models.

1) Assumptions for the Local Randomization Model:

• $F$ is an $N \times N$ unit-norm row matrix with the absolute magnitude of all entries on the order of $O(\frac{1}{\sqrt{N}})$.

• $\Psi$ is an $N \times N$ unit-norm column matrix with the maximal absolute magnitude of entries on the order of $o(1)$.

2) Assumptions for the Global Randomization Model: The global randomization model requires the same assumptions as the local randomization model plus the following extra assumptions:

• The average sum of entries on each column of $\Psi$ is on the order of $o(\frac{1}{\sqrt{N}})$.

• The sum of entries on each row of $F$ is zero.

• Entries on each row of $F$ and on each column of $\Psi$ are not all equal.

Fig. 1. QQ plots comparing the distribution of entries of $\Phi\Psi$ with the Gaussian distribution (horizontal axis: normal quantile $\mathcal{N}(0, 1/N)$). (a) $R$ is the local randomizer. (b) $R$ is the global randomizer. The plots all appear nearly linear, indicating that entries of $\Phi\Psi$ are nearly normally distributed.

Theorem III.1. Let $A = FR$, where $R$ is the local randomizer. Given the assumptions for the local randomization model, entries of $A\Psi$ are asymptotically normally distributed $\mathcal{N}(0, \sigma^2)$, where $\sigma^2 \le O(\frac{1}{N})$.

Proof. With the notations defined in Section II-B, we have:

$$S_{ij} = \langle A_i, \Psi_j \rangle = \sum_{k=1}^{N} F_{ik}\Psi_{kj}R_{kk}. \qquad (4)$$

Denote $Z_k = F_{ik}\Psi_{kj}R_{kk}$. Because $R_{kk}$ are i.i.d. Bernoulli random variables, $Z_k$ are independent zero-mean random variables, $E(Z_k) = 0$. The assumption that $|F_{ik}|$ are on the order of $O(\frac{1}{\sqrt{N}})$ implies that there exist two positive constants $c_1$ and $c_2$ such that:

$$\frac{c_1}{N}\Psi_{kj}^2 \le \mathrm{Var}(Z_k) \le \frac{c_2}{N}\Psi_{kj}^2. \qquad (5)$$

The variance of $S_{ij}$, $\sigma^2$, can be bounded as follows:

$$\frac{c_1}{N}\sum_{k=1}^{N}\Psi_{kj}^2 \le \sigma^2 = \sum_{k=1}^{N}\mathrm{Var}(Z_k) \le \frac{c_2}{N}\sum_{k=1}^{N}\Psi_{kj}^2. \qquad (6)$$

Since the column $\Psi_j$ has unit norm, (6) reads $c_1/N \le \sigma^2 \le c_2/N$. Because $S_{ij}$ is a sum of independent zero-mean random variables $\{Z_k\}_{k=1}^{N}$, according to the Central Limit Theorem (CLT) (see Appendix 1), $S_{ij} \to \mathcal{N}(0, O(\frac{1}{N}))$. To apply the CLT, we need to verify its convergence condition: for a given $\epsilon > 0$, there exists $N$ sufficiently large such that the $\mathrm{Var}(Z_k)$ satisfy:

$$\mathrm{Var}(Z_k) < \epsilon\sigma^2, \quad k = 1, 2, \ldots, N. \qquad (7)$$

To show that this convergence condition is met, we argue by contradiction. Assume there exists $\epsilon_0$ such that for all $N$, there exists at least one $k_0 \in \{1, 2, \ldots, N\}$ with:

$$\mathrm{Var}(Z_{k_0}) \ge \epsilon_0\sigma^2. \qquad (8)$$

From (5), (6) and (8), we obtain:

$$\frac{\epsilon_0 c_1}{N} \le \mathrm{Var}(Z_{k_0}) \le \frac{c_2}{N}\Psi_{k_0 j}^2. \qquad (9)$$

This inequality cannot be true if $\Psi_{k_0 j}$ is on the order of $o(1)$. The underlying intuition of the convergence condition is to guarantee that there is no random variable with dominant variance in the sum $S_{ij}$. In this case, it simply requires that there is no dominant entry on each column of $\Psi$. □
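The variance bound in the proof can be made concrete for one specific pair of transforms. In the sketch below, $F$ is the orthonormal Walsh-Hadamard matrix (so $c_1 = c_2 = 1$) and $\Psi$ is an orthonormal DCT basis; both choices and the helper names `hadamard` and `dct_matrix` are assumptions made for illustration only.

```python
import numpy as np

def hadamard(n):
    """Orthonormal Walsh-Hadamard matrix (Sylvester construction); n must be a power of 2."""
    h = np.array([[1.0]])
    while h.shape[0] < n:
        h = np.block([[h, h], [h, -h]])
    return h / np.sqrt(n)

def dct_matrix(n):
    """Orthonormal DCT-II matrix; row k is the k-th cosine basis vector."""
    k = np.arange(n)[:, None]
    t = np.arange(n)[None, :]
    c = np.sqrt(2.0 / n) * np.cos(np.pi * (2 * t + 1) * k / (2 * n))
    c[0, :] = 1.0 / np.sqrt(n)
    return c

N = 64
rng = np.random.default_rng(0)
F = hadamard(N)                      # every entry has magnitude 1/sqrt(N)
Psi = dct_matrix(N).T                # columns are unit-norm DCT basis vectors
r = rng.choice([-1.0, 1.0], size=N)  # diagonal of the local randomizer R
S = (F * r) @ Psi                    # S = F R Psi

var_exact = (F ** 2) @ (Psi ** 2)    # Var(S_ij) = sum_k F_ik^2 Psi_kj^2
print(var_exact.min(), var_exact.max(), np.mean(S ** 2))
```

For this pair every entry $S_{ij}$ has variance exactly $1/N$, and because $FR\Psi$ is itself orthonormal the average of $S_{ij}^2$ over all entries also equals $1/N$ for any draw of the signs.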

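Similarly, we can obtain a similar result when $R$ is a uniformly random permutation matrix.

Theorem III.2. Let $A = FR$, where $R$ is the global randomizer. Given the assumptions for the global randomization model, entries of $A\Psi$ are asymptotically normally distributed $\mathcal{N}(0, \sigma^2)$, where $\sigma^2 \le O(\frac{1}{N})$.

Proof. Let $[\omega_1, \omega_2, \ldots, \omega_N]$ be a uniform random permutation of $[1, 2, \ldots, N]$. Note that $\{\omega_k\}_{k=1}^{N}$ can be viewed as a sequence of random variables with identical distribution. In particular, for a fixed $k$:

$$P(\omega_k = i) = \frac{1}{N}, \quad i = 1, 2, \ldots, N.$$

Denote $Z_k = F_{i\omega_k}\Psi_{kj}$ (we omit the dependence of $Z_k$ on $i$ and $j$ to simplify the notation). We have:

$$S_{ij} = \langle A_i, \Psi_j \rangle = \sum_{k=1}^{N} F_{i\omega_k}\Psi_{kj} = \sum_{k=1}^{N} Z_k.$$

Using the assumption that the vector $F_i$ has zero average sum and unit norm, we derive:

$$E(Z_k) = \Psi_{kj}E(F_{i\omega_k}) = \Psi_{kj}\frac{1}{N}\sum_{p=1}^{N}F_{ip} = 0,$$

and also

$$E(Z_k^2) = \Psi_{kj}^2 E(F_{i\omega_k}^2) = \Psi_{kj}^2\frac{1}{N}\sum_{p=1}^{N}F_{ip}^2 = \frac{\Psi_{kj}^2}{N}.$$

In addition, note that although $\{\omega_k\}_{k=1}^{N}$ have identical distributions, they are correlated random variables because the uniformly random permutation samples without replacement. Thus, for a pair $k$ and $l$ such that $1 \le k \ne l \le N$, we have:

$$E(Z_k Z_l) = \Psi_{kj}\Psi_{lj}E(F_{i\omega_k}F_{i\omega_l}) = \frac{\Psi_{kj}\Psi_{lj}}{N(N-1)}\sum_{1 \le p \ne q \le N}F_{ip}F_{iq} = \frac{\Psi_{kj}\Psi_{lj}}{N(N-1)}\left(\Big(\sum_{p=1}^{N}F_{ip}\Big)^2 - \sum_{p=1}^{N}F_{ip}^2\right) = -\frac{\Psi_{kj}\Psi_{lj}}{N(N-1)}.$$

The last equation holds because the vector $F_i$ has zero average sum and unit norm. Then, we derive the expectation and the variance of $S_{ij}$ as follows:

$$E(S_{ij}) = 0;$$

$$\mathrm{Var}(S_{ij}) = \sum_{k=1}^{N}E(Z_k^2) + \sum_{1 \le k \ne l \le N}E(Z_k Z_l)$$
$$= \frac{1}{N}\sum_{k=1}^{N}\Psi_{kj}^2 - \frac{1}{N(N-1)}\sum_{1 \le k \ne l \le N}\Psi_{kj}\Psi_{lj}$$
$$= \frac{1}{N}\sum_{k=1}^{N}\Psi_{kj}^2 - \frac{1}{N(N-1)}\left(\Big(\sum_{k=1}^{N}\Psi_{kj}\Big)^2 - \sum_{k=1}^{N}\Psi_{kj}^2\right)$$
$$= \frac{1}{N} + \frac{1}{N(N-1)} - \frac{1}{N(N-1)}\Big(\sum_{k=1}^{N}\Psi_{kj}\Big)^2$$
$$\le \frac{1}{N} + \frac{1}{N(N-1)} = O\Big(\frac{1}{N}\Big).$$

The fourth equation holds because the column $\Psi_j$ has unit norm. The theorem is then a simple corollary of the Combinatorial Central Limit Theorem [20] (see Appendix 1), provided that its convergence condition can be verified, that is:

$$\lim_{N \to \infty} N\,\frac{\max_{1 \le k \le N}(F_{ik} - \bar{F}_i)^2}{\sum_{k=1}^{N}(F_{ik} - \bar{F}_i)^2}\,\frac{\max_{1 \le k \le N}(\Psi_{kj} - \bar{\Psi}_j)^2}{\sum_{k=1}^{N}(\Psi_{kj} - \bar{\Psi}_j)^2} = 0 \qquad (10)$$

where

$$\bar{F}_i = \frac{1}{N}\sum_{k=1}^{N}F_{ik}, \qquad \bar{\Psi}_j = \frac{1}{N}\sum_{k=1}^{N}\Psi_{kj}.$$

Because $\bar{F}_i = 0$, $\|F_i\|_2 = 1$ and $\max_{1 \le k \le N}|F_{ik}| = O(\frac{1}{\sqrt{N}})$, equation (10) holds if the following equation holds:

$$\lim_{N \to \infty}\frac{\max_{1 \le k \le N}(\Psi_{kj} - \bar{\Psi}_j)^2}{\sum_{k=1}^{N}(\Psi_{kj} - \bar{\Psi}_j)^2} = 0. \qquad (11)$$

Because $\{|\bar{\Psi}_j|\}_{j=1}^{N}$ are on the order of $o(\frac{1}{\sqrt{N}})$:

$$\sum_{k=1}^{N}(\Psi_{kj} - \bar{\Psi}_j)^2 = 1 - N\bar{\Psi}_j^2 = 1 - o(1). \qquad (12)$$

Also, because $|\bar{\Psi}_j| \le \max_{1 \le k \le N}|\Psi_{kj}|$ and $|\Psi_{kj}|$ are on the order of $o(1)$:

$$\max_{1 \le k \le N}(\Psi_{kj} - \bar{\Psi}_j)^2 \le 4\max_{1 \le k \le N}\Psi_{kj}^2 = o(1). \qquad (13)$$

The combination of (12) and (13) implies (11), and thus the convergence condition of the Combinatorial Central Limit Theorem is verified. □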
The condition that each row of $F$ has zero average sum guarantees that entries of $A\Psi$ have zero mean, while the condition that entries on each row of $F$ and on each column of $\Psi$ are not all equal prevents the degenerate case in which entries of $A\Psi$ become a deterministic quantity. For example, when entries of a row $F_i$ are all equal to $\frac{1}{\sqrt{N}}$, $S_{ij} = \frac{1}{\sqrt{N}}\sum_{k=1}^{N}\Psi_{kj}$, which is a deterministic quantity, not a random variable. Note that these conditions are not needed when $R$ is the local randomizer.

If $F$ is a DCT matrix, a (normalized) WHT matrix or a (normalized) DFT matrix, all the rows (except for the first one) have zero average sum due to the symmetry in these matrices. The first row, whose entries are all equal, can be considered as the averaging row, or a lowpass filtering operation. When the input signal is zero-mean, this row might be chosen or not without affecting the quality of the reconstructed signal. Otherwise, it should be included in the chosen row set to encode the signal's mean. Lastly, the condition that the absolute average sum of every column of the sparsifying basis $\Psi$ is on the order of $o(\frac{1}{\sqrt{N}})$ is also close to reality because the majority of columns of the sparsifying basis $\Psi$ can be roughly viewed as bandpass and highpass filters whose coefficients sum to zero. For example, if $\Psi$ is a wavelet basis (with at least one vanishing moment), then all columns of $\Psi$ (except the one at DC) have a column sum of zero.

The aforementioned theorems show that under certain conditions, the majority of entries of $A\Psi$ (and also of $\Phi\Psi$) behave like Gaussian random variables $\mathcal{N}(0, \sigma^2)$, where $\sigma^2 \le O(\frac{1}{N})$. Roughly speaking, this behavior contributes to a good sensing performance for the proposed framework. However, these asymptotic results are not sufficient for establishing a sensing performance analysis because, in general, entries of $A\Psi$ are not stochastically independent, violating a condition of a Gaussian i.i.d. sensing matrix. In fact, the sensing performance can be quantitatively analyzed by employing a powerful analysis framework for a random subset of rows of an orthonormal matrix [14]. Note that $A$ is also an orthonormal matrix when $R$ is the local or the global randomizer.

Based on the Gaussian tail probability and a union bound for the maximum absolute value of a random sequence, the maximum absolute magnitude of entries of $A\Psi$ can be asymptotically bounded as follows:

$$P\Big(\max_{1 \le i,j \le N}|S_{ij}| \ge t\Big) \lesssim 2N^2\exp\Big(-\frac{t^2}{2\sigma^2}\Big)$$

where $\sigma^2 \le \frac{c}{N}$, $c$ is some positive constant and $\lesssim$ stands for "asymptotically smaller or equal", i.e., when $N$ goes to infinity, $\lesssim$ becomes $\le$.

If we choose $t = \sqrt{\frac{2c\log(2N^2/\delta)}{N}}$, the above inequality is equivalent to:

$$P\left(\max_{1 \le i,j \le N}|S_{ij}| \le \sqrt{\frac{2c\log(2N^2/\delta)}{N}}\right) \gtrsim 1 - \delta$$

which implies that with probability at least $1 - \delta$, the mutual coherence of $A$ and $\Psi$ is upper bounded by $O\big(\sqrt{\frac{\log(N/\delta)}{N}}\big)$, which is close to the optimal bound, except for the $\log N$ factor.

In the following section, we will employ a more powerful tool from the theory of concentration inequalities to analyze the coherence between $A = FR$ and $\Psi$ when $N$ is finite. We also consider the more general case in which $F$ is a sparse matrix (e.g. a block-diagonal matrix).

B. Incoherence Analysis 

Before presenting theoretical results for incoherence analysis, we introduce assumptions for block-based local and global randomization models.

1) Assumptions for the Block-based Local Randomization Model:

• $F$ is an $N \times N$ unit-norm row matrix with the maximal absolute magnitude of entries on the order of $O(\frac{1}{\sqrt{B}})$, where $1 \le B \le N$, i.e. $\max_{1 \le i,j \le N}|F_{ij}| = \frac{c}{\sqrt{B}}$, where $c$ is some positive constant.

• $\Psi$ is an $N \times N$ unit-norm column matrix.

2) Assumptions for the Block-based Global Randomization Model: The block-based global randomization model requires the same assumptions as the block-based local randomization model plus the following assumption:

• All rows of $F$ have zero average sum.

Theorem III.3. Let $A = FR$, where $R$ is the local randomizer. Given the assumptions for the block-based local randomization model, then

• With probability at least $1 - \delta$, the mutual coherence of $A$ and $\Psi$ is upper bounded by $O\big(\sqrt{\frac{\log(N/\delta)}{B}}\big)$.

• In addition, if the maximal absolute magnitude of entries of $\Psi$ is on the order of $O(\frac{1}{\sqrt{N}})$, the mutual coherence is upper bounded by $O\big(\sqrt{\frac{\log(N/\delta)}{N}}\big)$, which is independent of $B$.

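Proof. A common proof strategy for this theorem, as well as for other theorems in this paper, is to establish a large deviation inequality implying that the quantity of interest is concentrated around its expected value with high probability. The proof steps include:

• Showing that the quantity of interest is a sum of independent random variables;

• Bounding the expectation and variance of the quantity;

• Applying a relevant concentration inequality for a sum of random variables;

• Applying a union bound for the maximum absolute value of a random sequence.

In this case, the quantity of interest is:

$$S_{ij} = \langle A_i, \Psi_j \rangle = \sum_{k \in \mathrm{supp}(F_i)} F_{ik}\Psi_{kj}R_{kk}.$$

Denote $Z_k = F_{ik}\Psi_{kj}R_{kk}$, for $k \in \mathrm{supp}(F_i)$ (the support set of the row $F_i$). Because $R_{kk}$ are i.i.d. Bernoulli random variables, $Z_k$ are independent random variables with $E(Z_k) = 0$. $Z_k$ are also bounded because $Z_k = \pm F_{ik}\Psi_{kj}$.

$S_{ij}$ is a sum of independent, bounded random variables. Applying Hoeffding's inequality (see Appendix 2) yields:

$$\Pr(|S_{ij}| \ge t) \le 2\exp\left(-\frac{t^2}{2\sum_{k \in \mathrm{supp}(F_i)}F_{ik}^2\Psi_{kj}^2}\right).$$

The next step is to evaluate $\sigma^2 = \sum_{k \in \mathrm{supp}(F_i)}F_{ik}^2\Psi_{kj}^2$. Here, $\sigma^2$ can be roughly viewed as an approximation of the variance of $S_{ij}$:

$$\sigma^2 \le \max_{1 \le i,j \le N}F_{ij}^2\sum_{k \in \mathrm{supp}(F_i)}\Psi_{kj}^2 \le \max_{1 \le i,j \le N}|F_{ij}|^2 = \frac{c^2}{B}. \qquad (14)$$

If the maximal absolute magnitude of entries of $\Psi$ is on the order of $O(\frac{1}{\sqrt{N}})$:

$$\max_{1 \le i,j \le N}|\Psi_{ij}| = \frac{c}{\sqrt{N}},$$

where $c$ is some positive constant, then

$$\sigma^2 \le \max_{1 \le i,j \le N}|\Psi_{ij}|^2\sum_{k=1}^{N}F_{ik}^2 \le \max_{1 \le i,j \le N}|\Psi_{ij}|^2 = \frac{c^2}{N}. \qquad (15)$$

Finally, we derive an upper bound of the mutual coherence $\mu = \max_{1 \le i,j \le N}|S_{ij}|$ by taking a union bound for the maximum absolute value of a random sequence:

$$P\Big(\max_{1 \le i,j \le N}|S_{ij}| \ge t\Big) \le 2N^2\exp\Big(-\frac{t^2}{2\sigma^2}\Big).$$

Choosing $t = \sqrt{2\sigma^2\log(2N^2/\delta)}$ and simplifying the inequality, we get:

$$P\Big(\max_{1 \le i,j \le N}|S_{ij}| \le \sqrt{2\sigma^2\log(2N^2/\delta)}\Big) \ge 1 - \delta.$$

Thus, with an arbitrary $\Psi$, (14) holds and we obtain the first claim of the Theorem: with probability at least $1 - \delta$,

$$\mu \le c\sqrt{\frac{2\log(2N^2/\delta)}{B}} = O\left(\sqrt{\frac{\log(N/\delta)}{B}}\right).$$

In the case that (15) holds, we obtain the second claim of the Theorem: with probability at least $1 - \delta$,

$$\mu \le c\sqrt{\frac{2\log(2N^2/\delta)}{N}} = O\left(\sqrt{\frac{\log(N/\delta)}{N}}\right).$$

□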
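The bound obtained from Hoeffding's inequality and the union bound, $\max|S_{ij}| \le \sqrt{2\sigma^2\log(2N^2/\delta)}$ with probability at least $1 - \delta$, can be compared against one random draw. The transforms (dense Walsh-Hadamard $F$, DCT basis $\Psi$), the value of $\delta$, and the helper names are all assumptions of this sketch.

```python
import numpy as np

def hadamard(n):
    """Orthonormal Walsh-Hadamard matrix (Sylvester construction); n must be a power of 2."""
    h = np.array([[1.0]])
    while h.shape[0] < n:
        h = np.block([[h, h], [h, -h]])
    return h / np.sqrt(n)

def dct_matrix(n):
    """Orthonormal DCT-II matrix; row k is the k-th cosine basis vector."""
    k = np.arange(n)[:, None]
    t = np.arange(n)[None, :]
    c = np.sqrt(2.0 / n) * np.cos(np.pi * (2 * t + 1) * k / (2 * n))
    c[0, :] = 1.0 / np.sqrt(n)
    return c

N, delta = 64, 1e-6
rng = np.random.default_rng(1)
F = hadamard(N)                                  # dense F, so B = N here
Psi = dct_matrix(N).T
r = rng.choice([-1.0, 1.0], size=N)              # local randomizer signs
S = (F * r) @ Psi                                # entries S_ij of F R Psi

sigma2 = np.max((F ** 2) @ (Psi ** 2))           # worst case of sum_k F_ik^2 Psi_kj^2
t = np.sqrt(2.0 * sigma2 * np.log(2.0 * N ** 2 / delta))
mu = np.max(np.abs(S))                           # observed mutual coherence
print(mu, t)
```

The observed coherence should sit below the threshold `t` except with probability at most $\delta$; the gap between the two reflects the looseness of the union bound.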



Remark III.1. When $A$ is some popular transform such as the DCT or the normalized WHT, the maximal absolute magnitude of entries is on the order of $O(\frac{1}{\sqrt{N}})$. As a result, the mutual coherence of $A$ and an arbitrary $\Psi$ is upper bounded by $O\big(\sqrt{\frac{\log(N/\delta)}{N}}\big)$, which is also consistent with our asymptotic analysis above. In other words, when at least $\Phi$ or $\Psi$ is a dense and uniform matrix, i.e. the maximal absolute magnitude of its entries is on the order of $O(\frac{1}{\sqrt{N}})$, their mutual coherence approaches the minimal bound, except for the $\log N$ factor. In general, the mutual coherence between an arbitrary $\Psi$ and a sparse matrix $A$ (e.g. a block diagonal matrix of block size $B$) might be $\sqrt{N/B}$ times larger.

Cumulative coherence is another way to quantify incoherence between two matrices [21].

Definition III.1. The cumulative coherence of an $N \times N$ matrix $A$ and an $N \times K$ matrix $B$ is defined as:

$$\mu_c(A, B) = \max_{1 \le i \le N}\sqrt{\sum_{1 \le j \le K}\langle A_i, B_j \rangle^2}$$

where $A_i$ and $B_j$ are rows of $A$ and columns of $B$, respectively.

The cumulative coherence $\mu_c(A, B)$ measures the average incoherence between two matrices $A$ and $B$ while the mutual coherence $\mu(A, B)$ measures the entry-wise incoherence. As a result, the cumulative coherence seems to be a better indicator of average sensing performance. In many cases, we are only interested in the cumulative coherence between $A$ and $\Psi_T$, where $T$ is the support of the transform coefficient vector. As will be shown in the following section, the cumulative coherence provides a more powerful tool to obtain a tighter bound for the number of measurements required for exact recovery.

From the definition of cumulative coherence, it is easy to verify that $\mu_c \le \sqrt{K}\mu$. If we directly apply the result of Theorem III.3, we obtain a trivial bound of the cumulative coherence: $\mu_c = O\big(\sqrt{\frac{K\log(N/\delta)}{B}}\big)$ for an arbitrary basis $\Psi$ and $O\big(\sqrt{\frac{K\log(N/\delta)}{N}}\big)$ for a dense and uniform $\Psi$. In fact, we can get rid of the factor $\log N$ by directly measuring the cumulative coherence from its definition.
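The relation $\mu_c \le \sqrt{K}\mu$ can be observed numerically. In the sketch below, both coherences are evaluated between $A = FR$ and the $K$ columns $\Psi_T$; the transforms, the random support $T$, and the function names are assumptions made for illustration.

```python
import numpy as np

def hadamard(n):
    """Orthonormal Walsh-Hadamard matrix (Sylvester construction); n must be a power of 2."""
    h = np.array([[1.0]])
    while h.shape[0] < n:
        h = np.block([[h, h], [h, -h]])
    return h / np.sqrt(n)

def dct_matrix(n):
    """Orthonormal DCT-II matrix; row k is the k-th cosine basis vector."""
    k = np.arange(n)[:, None]
    t = np.arange(n)[None, :]
    c = np.sqrt(2.0 / n) * np.cos(np.pi * (2 * t + 1) * k / (2 * n))
    c[0, :] = 1.0 / np.sqrt(n)
    return c

def mutual_coherence(a, b):
    """max over (i, j) of |<a_i, b_j>| for rows a_i of a and columns b_j of b."""
    return np.max(np.abs(a @ b))

def cumulative_coherence(a, b):
    """max over i of sqrt(sum_j <a_i, b_j>^2), i.e. the largest row norm of a @ b."""
    return np.max(np.linalg.norm(a @ b, axis=1))

N, K = 64, 8
rng = np.random.default_rng(2)
A = hadamard(N) * rng.choice([-1.0, 1.0], size=N)   # A = F R with the local randomizer
T = rng.choice(N, size=K, replace=False)            # support of the K nonzero coefficients
Psi_T = dct_matrix(N).T[:, T]                       # the K selected columns of Psi

mu = mutual_coherence(A, Psi_T)
mu_c = cumulative_coherence(A, Psi_T)
print(mu, mu_c, np.sqrt(K) * mu)
```

Since $\mu_c$ is a row norm over $K$ entries each bounded by $\mu$, it always lies between $\mu$ and $\sqrt{K}\mu$; it is typically well below the upper end, which is the slack the direct analysis exploits.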

Theorem III.4. Let $A = FR$, where $R$ is the local randomizer. Given the assumptions for the block-based local randomization model, with probability at least $1 - \delta$, the cumulative coherence of $A$ and $\Psi_T$, where $|T| = K$, is upper bounded by $\frac{2c}{\sqrt{B}}\max\big(\sqrt{K}, 4\sqrt{\log(2N/\delta)}\big)$.

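Proof. Denote $U = \Psi_T^*$ and let $U_k$ be the columns of $U$. Let $A_i$ and $\Psi_j$ ($j \in T$) be rows of $A$ and columns of $\Psi_T$, respectively. Then

$$S_i = \sqrt{\sum_{j \in T}\langle A_i, \Psi_j \rangle^2} = \|A_i\Psi_T\|_2 = \Big\|\sum_{k \in \mathrm{supp}(F_i)}R_{kk}F_{ik}U_k\Big\|_2.$$

Denote $V_k = F_{ik}U_k$ and let $V$ be the matrix of columns $V_k$, $k \in \mathrm{supp}(F_i)$. First, we derive an upper bound for the Frobenius norm of $V$:

$$\|V\|_F^2 \le \max_{1 \le i,j \le N}F_{ij}^2\,\|U\|_F^2 = \frac{c^2 K}{B}.$$

The last equation holds because $\|U\|_F^2 = K$. Also, the bound for the spectral norm is:

$$\|V\|_2^2 = \sup_{\|\beta\|_2 = 1}\sum_{k \in \mathrm{supp}(F_i)}\langle \beta, V_k \rangle^2 = \sup_{\|\beta\|_2 = 1}\sum_{k \in \mathrm{supp}(F_i)}F_{ik}^2\langle \beta, U_k \rangle^2 \le \frac{c^2}{B}\sup_{\|\beta\|_2 = 1}\sum_{1 \le k \le N}\langle \beta, U_k \rangle^2 = \frac{c^2}{B}\|U\|_2^2 = \frac{c^2}{B}.$$

The last equation holds because $\|U\|_2 = 1$. Now, we have:

$$S_i = \Big\|\sum_{k \in \mathrm{supp}(F_i)}R_{kk}F_{ik}U_k\Big\|_2 = \Big\|\sum_{k \in \mathrm{supp}(F_i)}R_{kk}V_k\Big\|_2.$$

Let us denote $Z = \sum_{k \in \mathrm{supp}(F_i)}R_{kk}V_k$.

$Z$ is a Rademacher sum of vectors and $S_i = \|Z\|_2$ is a random variable. To show that $S_i$ is concentrated around its expectation, we first derive a bound on $E(\|Z\|_2)$. It is easy to verify that for a random variable $X$, $E(X) \le \sqrt{E(X^2)}$. Thus, we derive the upper bound for the simpler quantity $E(\|Z\|_2^2)$:

$$E(\|Z\|_2^2) = E(Z^*Z) = \sum_{k,l \in \mathrm{supp}(F_i)}E(R_{kk}R_{ll})\langle V_k, V_l \rangle = \sum_{k \in \mathrm{supp}(F_i)}\langle V_k, V_k \rangle = \|V\|_F^2 \le \frac{c^2 K}{B}.$$

The third equality holds because $R_{kk}$ are i.i.d. Bernoulli random variables and thus $E(R_{kk}R_{ll}) = 0$ for all $k \ne l$. As a result,

$$E(S_i) = E(\|Z\|_2) \le c\sqrt{\frac{K}{B}}.$$

Applying Ledoux's concentration inequality for the norm of a Rademacher sum of vectors [22] (see Appendix 2), and noting that $\|V\|_2^2$ can be viewed as the variance of $S_i$, yields:

$$\Pr\Big(S_i \ge c\sqrt{\frac{K}{B}} + t\Big) \le 2\exp\Big(-\frac{t^2 B}{16c^2}\Big).$$

Finally, applying a union bound for the maximum absolute value of a random process, we obtain:

$$\Pr\Big(\max_{1 \le i \le N}S_i \ge c\sqrt{\frac{K}{B}} + t\Big) \le 2N\exp\Big(-\frac{t^2 B}{16c^2}\Big).$$

Choosing $t = \frac{4c}{\sqrt{B}}\sqrt{\log(2N/\delta)}$, we get:

$$\Pr\Big(\max_{1 \le i \le N}S_i \ge \frac{c}{\sqrt{B}}\big(\sqrt{K} + 4\sqrt{\log(2N/\delta)}\big)\Big) \le \delta.$$

Finally, we derive:

$$\Pr\Big(\max_{1 \le i \le N}S_i \ge \frac{2c}{\sqrt{B}}\max\big(\sqrt{K}, 4\sqrt{\log(2N/\delta)}\big)\Big) \le \delta.$$

□

Remark III.2. When $K \ge 16\log(2N/\delta)$, the cumulative coherence is upper bounded by $O\big(\sqrt{\frac{K}{B}}\big)$. When $K < 16\log(2N/\delta)$, the upper bound of the cumulative coherence is $O\big(\sqrt{\frac{\log(N/\delta)}{B}}\big)$, which is similar to that of the mutual coherence in Theorem III.3.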

Remark III.3. When $F$ is some popular transform such as the DCT or the normalized WHT, the maximum absolute magnitude of entries is on the order of $O(\frac{1}{\sqrt{N}})$. As a result, the cumulative coherence of $A$ and any arbitrary $\Psi_T$, where $|T| = K$, is upper bounded by $O\big(\sqrt{\frac{K}{N}}\big)$ if $K \ge 16\log(2N/\delta)$.

Remark III.4. The above theorem represents the worst-case analysis because $\Psi$ can be an arbitrary matrix (the worst case corresponds to the case when $\Psi$ is the identity matrix). When $\Psi$ is known to be dense and uniform, the upper bound of cumulative coherence, according to Theorem III.3 and the fact that $\mu_c \le \mu\sqrt{K}$, is $O\big(\sqrt{\frac{K\log(N/\delta)}{N}}\big)$, which is, in general, better than $O\big(\sqrt{\frac{K}{B}}\big)$.

The asymptotic distribution analysis in Section III-A reveals a significant technical difference between the two randomization models. With the local randomizer, entries of $A\Psi$ are sums of independent random variables while with the global randomizer, they are sums of dependent random variables. Stochastic dependence among random variables makes it much harder to set up similar arguments for the concentration of their sum. In this case, we will show that the incoherence of $A$ and $\Psi$ might depend on an extra quantity, the heterogeneity coefficient of the matrix $\Psi$.
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Definition III.2. Assume 'S' is an TV x matrix. Let 71- be 
the support of the column Define: 

maxi<i<Ar 



Pk 



Ti.l Si 



\Tk \ Tfc *L 



(16) 



The column-wise heterogeneity coefficient of the matrix W is 
defined as: 

p$ = max Pk- (17) 

l<fe<7V 

Obviously, $1 \le \rho_k \le \sqrt{|T_k|}$. $\rho_k$ illustrates the difference
between the largest entry's magnitude and the average energy
of nonzero entries. Roughly speaking, it indicates the heterogeneity
of nonzero entries of the vector $\Psi_k$. If nonzero entries of a
column $\Psi_k$ are homogeneous, i.e. they are on the same order
of magnitude, $\rho_k$ is on the order of a constant. If all nonzero
entries of a matrix are homogeneous, the heterogeneity coefficient
is also on the order of a constant, $\rho_\Psi = O(1)$, and $\Psi$
is referred to as a uniform matrix. Note that a uniform matrix is
not necessarily dense, for example, a block-diagonal matrix of
DCT or WHT blocks.
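The definition can be checked numerically. The following is a minimal sketch, not taken from the paper: the function name `heterogeneity` and the tolerance `tol` used to decide which entries count as nonzero are assumptions made for illustration, and the matrix is assumed to have no all-zero column.

```python
import numpy as np

def heterogeneity(Psi, tol=1e-12):
    """Return (rho, rho_Psi): per-column coefficients as in (16) and their maximum (17).

    `tol` (an assumption of this sketch) decides which entries are treated as nonzero.
    """
    abs_psi = np.abs(np.asarray(Psi, dtype=float))
    support = abs_psi > tol                         # T_k: support of each column
    size = support.sum(axis=0)                      # |T_k|
    energy = (abs_psi ** 2 * support).sum(axis=0)   # sum of squares over T_k
    rms = np.sqrt(energy / size)                    # root of the average energy on T_k
    rho = abs_psi.max(axis=0) / rms
    return rho, rho.max()

# A block-diagonal matrix of 2 x 2 normalized WHT blocks: sparse but uniform.
H2 = np.array([[1.0, 1.0], [1.0, -1.0]]) / np.sqrt(2.0)
rho, rho_max = heterogeneity(np.kron(np.eye(4), H2))
print(rho_max)
```

For this block-diagonal example every nonzero entry has the same magnitude, so the coefficient is numerically 1, which illustrates that a uniform matrix need not be dense.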

The following theorem indicates that when the global randomizer
is employed, the mutual coherence between A and $\Psi$
is upper-bounded by $O(\rho_\Psi\sqrt{\log(N/\delta)/B})$, where B is the block
size of F and $\Psi$ is an arbitrary matrix with the heterogeneity
coefficient $\rho_\Psi$.

Theorem III.5. Let $A = FR$, where R is the global randomizer.
Assume that $|T_k| \ge 4\log(2N^2/\delta)$ $\forall k \in \{1, 2, \ldots, N\}$,
where $T_k$ is defined as in (16). Given the assumptions for the
block-based global randomization model, then

• With probability at least $1 - \delta$, the mutual coherence of
A and $\Psi$ is upper-bounded by $O(\rho_\Psi\sqrt{\log(N/\delta)/B})$, where
$\rho_\Psi$ is defined as in (17).

• In addition, if $\Psi$ is dense and uniform, i.e. the maximum
absolute magnitude of its entries is on the order of
$O(1/\sqrt{N})$, and $B \ge 4\log(2N^2/\delta)$, the mutual coherence is
upper-bounded by $O(\sqrt{\log(N/\delta)/N})$, which is independent
of B.

Proof. Let $[\omega_1, \omega_2, \ldots, \omega_N]$ be a uniformly random permutation
of $[1, 2, \ldots, N]$.

$$S_{ij} = \sum_{k=1}^{N} F_{ik}\Psi_{\omega_k j}.$$

As in the proof of Theorem III.2, $\{\omega_k\}_{k=1}^{N}$ can be
viewed as a sequence of dependent random variables with
identical distribution, i.e. for a fixed $k \in \{1, 2, \ldots, N\}$:

$$P(\omega_k = i) = \frac{1}{N}, \quad i \in \{1, 2, \ldots, N\}.$$

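The condition on F is equivalent to $\max_{1 \le i,j \le N} |F_{ij}| = c/\sqrt{B}$,
where c is some positive constant. With $T_j$ denoting the support of
the column $\Psi_j$, define $\{q_{k\omega_k}\}_{k=1}^{N}$ as follows:

$$q_{k\omega_k} = \begin{cases} \dfrac{\sqrt{B|T_j|}}{2c\rho_\Psi} F_{ik}\Psi_{\omega_k j} + \dfrac{1}{2} & \text{if } \Psi_{\omega_k j} \ne 0, \\ 0 & \text{if } \Psi_{\omega_k j} = 0. \end{cases}$$

It is easy to verify that $0 \le q_{k\omega_k} \le 1$. Define $W$ as the
sum of the dependent random variables $q_{k\omega_k}$:

$$W = \sum_{k=1}^{N} q_{k\omega_k} = \frac{\sqrt{B|T_j|}}{2c\rho_\Psi}\sum_{k=1}^{N} F_{ik}\Psi_{\omega_k j} + \frac{|T_j|}{2} = \frac{\sqrt{B|T_j|}}{2c\rho_\Psi} S_{ij} + \frac{|T_j|}{2}.$$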

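Note that $\{F_{ik}\Psi_{\omega_k j}\}_{k=1}^{N}$ are zero-mean random variables
because $F_i$ has zero average sum. Thus, $E(S_{ij}) = 0$ and
$E(W) = |T_j|/2$, where $T_j$ is the support of the column $\Psi_j$.
Then, applying Sourav's concentration inequality for a sum of
dependent random variables [23] (see Appendix II) results in:

$$P\left(\frac{\sqrt{B|T_j|}}{2c\rho_\Psi}|S_{ij}| \ge \epsilon\right) \le 2\exp\left(-\frac{\epsilon^2}{2|T_j| + 2\epsilon}\right).$$

Denote $t = \frac{2c\rho_\Psi}{\sqrt{B|T_j|}}\epsilon$. The above inequality is equivalent to:

$$P(|S_{ij}| \ge t) \le 2\exp\left(-\frac{\frac{B|T_j|}{4c^2\rho_\Psi^2}t^2}{2|T_j| + \frac{t}{c\rho_\Psi}\sqrt{B|T_j|}}\right).$$

By choosing $t = 4c\rho_\Psi\sqrt{\frac{1}{B}\log\left(\frac{2N^2}{\delta}\right)}$, we achieve:

$$P(|S_{ij}| \ge t) \le 2\exp\left(-\frac{4|T_j|\log\left(\frac{2N^2}{\delta}\right)}{2|T_j| + 4\sqrt{|T_j|\log\left(\frac{2N^2}{\delta}\right)}}\right).$$

If $|T_j| \ge 4\log(2N^2/\delta)$, the denominator inside the exponent
is at most $4|T_j|$. Thus,

$$P\left(|S_{ij}| \ge 4c\rho_\Psi\sqrt{\frac{1}{B}\log\left(\frac{2N^2}{\delta}\right)}\right) \le 2\exp\left(-\log\left(\frac{2N^2}{\delta}\right)\right) = \frac{\delta}{N^2}.$$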

Finally, after taking the union bound for the maximum
absolute value of a random sequence and simplifying the
inequality, we obtain the first claim of the Theorem:

$$P\left(\max_{1 \le i,j \le N} |S_{ij}| \le O\left(\rho_\Psi\sqrt{\frac{\log(N/\delta)}{B}}\right)\right) \ge 1 - \delta.$$

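Now suppose $\Psi$ is known to be dense and uniform, i.e.
$\max_{1 \le i,j \le N} |\Psi_{ij}| = c_1/\sqrt{N}$, where $c_1$ is some positive
constant. We then define $\{q_{k\omega_k}\}_{k=1}^{N}$ as follows:

$$q_{k\omega_k} = \begin{cases} \dfrac{\sqrt{BN}}{2cc_1} F_{ik}\Psi_{\omega_k j} + \dfrac{1}{2} & \text{if } F_{ik} \ne 0, \\ 0 & \text{if } F_{ik} = 0. \end{cases}$$

Note that $0 \le q_{k\omega_k} \le 1$ and $E\left[\sum_{k=1}^{N} q_{k\omega_k}\right] = B/2$. Using the
same arguments as above, we have:

$$P(|S_{ij}| \ge t) \le 2\exp\left(-\frac{\frac{NB}{4c^2c_1^2}t^2}{2B + \frac{t}{cc_1}\sqrt{NB}}\right).$$

Similarly, choosing $t = 4cc_1\sqrt{\frac{1}{N}\log\left(\frac{2N^2}{\delta}\right)}$, we can derive:

$$P(|S_{ij}| \ge t) \le 2\exp\left(-\frac{4B\log\left(\frac{2N^2}{\delta}\right)}{2B + 4\sqrt{B\log\left(\frac{2N^2}{\delta}\right)}}\right).$$

If $B \ge 4\log(2N^2/\delta)$, the denominator inside the exponent is
at most $4B$. Thus,

$$P\left(|S_{ij}| \ge 4cc_1\sqrt{\frac{1}{N}\log\left(\frac{2N^2}{\delta}\right)}\right) \le \frac{\delta}{N^2}.$$

After taking the union bound of the maximum absolute
value of a random sequence, we achieve the second claim
of the Theorem. □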

Remark III.5. The first part of the theorem implies that when
F is a dense and uniform matrix (e.g. DCT or normalized
WHT) and $\Psi$ is a uniform matrix (not necessarily dense),
the mutual coherence closely approaches the minimum bound,
being $O(\sqrt{\log(N/\delta)/N})$. Although in this theorem the mutual
coherence depends on the heterogeneity coefficient, one will see
in the experimental Section V that this dependence is almost
negligible in practice.

As a consequence of this theorem, when at least one of A or $\Psi$
is dense and uniform, the mutual coherence of A and $\Psi$ is
roughly on the order of $O(\sqrt{\log N / N})$, which is quite close to the
minimal bound $1/\sqrt{N}$ except for the $\log N$ factor. Otherwise,
the coherence depends on the block size B of F
and is on the order of $O(\sqrt{\log N / B})$. As a matter of fact, this
bound is almost optimal because when $\Psi$ is the identity matrix,
the mutual coherence is actually equal to the maximum absolute
magnitude of entries of A, which is on the order of $O(1/\sqrt{B})$.
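The dependence of the coherence on the block size can be explored numerically. The sketch below is illustrative only: the helper names (`hadamard`, `dct_matrix`, `srm_matrix`, `mutual_coherence`), the choice of WHT blocks, and the sizes N = 256, B = 64 are assumptions, and the dense matrices are formed explicitly only because N is small.

```python
import numpy as np

def hadamard(n):
    """Normalized n x n Walsh-Hadamard matrix (n a power of two), Sylvester construction."""
    H = np.array([[1.0]])
    while H.shape[0] < n:
        H = np.block([[H, H], [H, -H]])
    return H / np.sqrt(n)

def dct_matrix(N):
    """Orthonormal DCT-II matrix; its transpose is the IDCT basis."""
    k = np.arange(N)[:, None]
    n = np.arange(N)[None, :]
    C = np.sqrt(2.0 / N) * np.cos(np.pi * (2 * n + 1) * k / (2 * N))
    C[0] /= np.sqrt(2.0)
    return C

def srm_matrix(N, B, randomizer, rng):
    """A = F R, with F block-diagonal (B x B WHT blocks) and R local or global."""
    F = np.kron(np.eye(N // B), hadamard(B))
    if randomizer == "local":
        R = np.diag(rng.choice([-1.0, 1.0], size=N))  # random sign flips
    else:
        R = np.eye(N)[rng.permutation(N)]             # random permutation
    return F @ R

def mutual_coherence(A, Psi):
    """Largest absolute entry of A Psi."""
    return np.abs(A @ Psi).max()

rng = np.random.default_rng(0)
N, B = 256, 64
Psi_dense = dct_matrix(N).T
for kind in ("local", "global"):
    A = srm_matrix(N, B, kind, rng)
    print(kind,
          mutual_coherence(A, Psi_dense),   # dense, uniform basis
          mutual_coherence(A, np.eye(N)))   # identity basis: equals 1/sqrt(B)
```

With the identity basis the coherence equals the largest entry of A, that is $1/\sqrt{B}$, which is the worst case discussed above; with the dense IDCT basis the printed values can be compared against $1/\sqrt{N}$.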

Remark III.6. Although the theoretical results of the global
randomizer seem to be always weaker than those of the local
randomizer, there are a few practical motivations to study
this global randomizer. Speech scrambling has been used for
a long time for secure voice communication. Also, analog
image/video scrambling has been implemented for commercial
security-related applications such as CCTV surveillance
systems. In addition, permutation does not change the dynamic
range of the sensing signal, i.e. there is no bit expansion in
implementation. The computation cost of random permutation is only
$O(N)$, which is very easy to implement in software. From
a security perspective, the operation of random permutation
offers a larger key space than random sign flipping ($N!$ vs
$2^N$). Also, as will be shown in the numerical experiment
section, with random permutation, one can get a highly sparse
measurement matrix.
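The key-space comparison ($N!$ versus $2^N$) can be made concrete by expressing both in bits. This is a small illustrative calculation; the function names are hypothetical and the log-gamma function is used only to avoid forming $N!$ explicitly.

```python
import math

def key_bits_permutation(N):
    """log2(N!) computed through the log-gamma function."""
    return math.lgamma(N + 1) / math.log(2)

def key_bits_sign_flips(N):
    """log2(2^N) = N."""
    return float(N)

for N in (256, 512 * 512):
    print(N, key_bits_permutation(N), key_bits_sign_flips(N))
```

For any N of practical interest the permutation key is longer than N bits, which is the sense in which the global randomizer has the larger key space.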



IV. Compressive Sampling Performance Analysis 

Section III demonstrates that under some mild conditions,
the matrices A and $\Psi$ are highly incoherent, implying that
the matrix $A\Psi$ is almost dense. When $A\Psi$ is dense, the energy
of the nonzero transform coefficients $\alpha_T$ is distributed over all
measurements. Generally speaking, this is good for signal
recovery from a small subset of measurements because if
the energy of some transform coefficients were concentrated in
a few measurements that happen to be bypassed in the sampling
process, there would be no hope for exact signal recovery even when
employing the most sophisticated reconstruction method. This
section shows that a random subset of rows of the matrix
$A = FR$ yields an almost optimal measurement matrix $\Phi$ for
compressive sensing.



A. Assumptions for Performance Analysis 

A signal x is assumed to be sparse in some sparsifying basis
$\Psi$: $x = \Psi\alpha$, where the vector of transform coefficients $\alpha$ has
no more than K nonzero entries. The sign sequence of the nonzero
transform coefficients $\alpha_T$, which is denoted as z, is assumed
to be a random vector of i.i.d. Bernoulli random variables (i.e.
$P(z_i = \pm 1) = \frac{1}{2}$). Let $y = \Phi x$ be the measurement vector,
where $\Phi = \sqrt{\frac{N}{M}} D F R$ is a Structurally Random Matrix.
Assumptions of the block-based local randomization and of
the block-based global randomization models hold.
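The measurement model $y = \sqrt{N/M}\, D F R x$ can be applied without ever forming an $N \times N$ matrix. The sketch below is one possible realization under stated assumptions: F is taken to be a block-diagonal normalized WHT computed with a fast transform, exactly M coefficients are kept (whereas the analysis models D with i.i.d. Bernoulli diagonal entries), and the names `fwht` and `srm_sense` are hypothetical.

```python
import numpy as np

def fwht(x):
    """Orthonormal fast Walsh-Hadamard transform of a vector whose length is a power of two."""
    x = np.array(x, dtype=float)
    n = x.size
    h = 1
    while h < n:
        x = x.reshape(-1, 2, h)                       # pair up blocks of length h
        x = np.stack([x[:, 0] + x[:, 1], x[:, 0] - x[:, 1]], axis=1).reshape(-1)
        h *= 2
    return x / np.sqrt(n)

def srm_sense(x, M, B, randomizer, rng):
    """Return (y, keep) with y = sqrt(N/M) * (D F R x) and keep the retained indices."""
    N = x.size
    if randomizer == "local":
        pre = x * rng.choice([-1.0, 1.0], size=N)     # R: random sign flips
    else:
        pre = x[rng.permutation(N)]                   # R: random permutation
    # F: block-diagonal transform applied block by block (streaming-friendly).
    coeffs = np.concatenate([fwht(block) for block in pre.reshape(-1, B)])
    # D: keep M coefficients chosen uniformly at random (assumption: exactly M).
    keep = np.sort(rng.choice(N, size=M, replace=False))
    return np.sqrt(N / M) * coeffs[keep], keep

rng = np.random.default_rng(0)
x = rng.standard_normal(256)
y, keep = srm_sense(x, M=128, B=64, randomizer="global", rng=rng)
print(y.shape)
```

Only one block of B samples needs to be buffered at a time, which is the block-based processing property referred to in the assumptions.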

B. Theoretical Results 

Theorem IV.1. With probability at least $1 - \delta$, the proposed
sensing framework can recover K-sparse signals exactly if
the number of measurements $M \ge O(\frac{N}{B} K \log^2(\frac{N}{\delta}))$. If F is
dense and uniform rather than block-diagonal (e.g. a DCT or
normalized WHT matrix), the number of measurements needed
is on the order of $O(K \log^2(\frac{N}{\delta}))$.

Proof. This is a simple corollary of the theorem of Candès
et al. [14, Theorem 1.1] because (i) $A = FR$ is an
orthonormal matrix, and (ii) our incoherence results between
A and $\Psi$ in Theorem III.3 and Theorem III.5 hold. □

Remark IV.1. If $\Psi$ is dense and uniform, the number of
measurements for exact recovery is always $O(K \log^2(\frac{N}{\delta}))$
regardless of the block size B. This implies that we can use the
identity matrix for the transform F ($B = 1$). For example, when
the input signal is known to be spectrally sparse, compressively
sampling it in the time domain is as efficient as in any other
transform domain.

Compared with the framework that uses random projection,
there is an upscale factor of $\log N$ in the number of measurements
for exact recovery. In fact, by employing the bound on the
cumulative coherence, we can eliminate this upscale factor and
thus show an optimal performance guarantee.

Theorem IV.2. Assume that the sparsity $K \ge 16\log(\frac{2N}{\delta})$.
With probability at least $1 - \delta$, the proposed framework employing
the local randomizer can reconstruct K-sparse signals exactly
if the number of measurements $M \ge O(\frac{N}{B} K \log(\frac{N}{\delta}))$. If
F is a dense and uniform matrix (e.g. DCT or normalized
WHT), the minimal number of required measurements is $M =
O(K \log(\frac{N}{\delta}))$.

Proof. The proof is based on the result on cumulative coherence
in Theorem III.4 and a modification of the proof
framework of compressed sensing in [14].

Denote U = .f^FR^, Ur = 



^FR^r, Un = 



§jDFR^ and Unr = \J Td^FR^r^ where the support 

VL = {k\Dkk = 1, fc = 1, 2, .., N}. Let Vk, k e {1, 2, N}, 
be columns of U'ij-. Denote /ic ~ maxi<fe<Ar ||t;i:||2, where 

— l^ciA, 'i'7-) is the cumulative coherence of A 
and ' ^7-. According to the above incoherence analysis, /i, 
0( 



§FR 

, < 



-g^). Also, denote p as the mutual coherence of A and 



jVlog jV 
BM 
As indicated in [12], [14], to show exact recovery by $\ell_1$
minimization, it is sufficient to verify the Exact Recovery Principle.

Exact Recovery Principle. With high probability, $|\pi_k| < 1$
for all $k \in T^c$, where $T^c$ is the complementary set of the
set $T$ and $\pi = U_\Omega^* U_{\Omega T}(U_{\Omega T}^* U_{\Omega T})^{-1} z$, where z is the sign
vector of the nonzero transform coefficients $\alpha_T$.

Note that $\pi_k = \langle \nu_k (U_{\Omega T}^* U_{\Omega T})^{-1}, z \rangle$, where $\nu_k$ is the $k$-th
row of $U_\Omega^* U_{\Omega T}$, for some $k \in T^c$. To establish the Exact
Recovery Principle, we will first derive the following lemmas. The
first lemma is to bound the norm of $\nu_k$.

Lemma IV.1. (Bound the norm of $\nu_k$) With high probability,
$\|\nu_k\|$ is on the order of $O(\mu_c)$:

$$P(\|\nu_k\| \ge \mu_c + a\sigma) \le 3\exp(-\gamma a^2),$$

where a, $\gamma$ and $\sigma$ are some certain numbers.

Proof. Let U k be columns of U . For k £ T'^: 



In addition, it is obvious that \Zk\ < 1 and thus 
B = max \\hi\\2 < UUc- 

l<i<N 

The Talagrand's theorem ||24 l (see Appendix 2) shows that: 



Bt 



P(K||-i^(K||) > t) < 3exp(-log(l+-,^^^^ll^^ll^ 

where c is some positive constant. Replacing and 
B by their upper bounds in the right-hand side, we obtain: 



)), 



P{\\uk\\ - E{\\uk\\) >t)< 3exp( log(l 



CfJ,flc 



:))• 



1 >^ 

1=1 



1=1 



where the second equality holds because X]j=i UikVi = 
U^Uk = that results from the orthogonality of columns 
of U. Let Zi = (Da — ^). Because Da are i.i.d binary 
random variables with P^JDu = 1) = Zi are zero mean 
i.i.d random variables and E{Zf) = ^(1 - ^). Let if be the 
matrix of columns hi = Ui^Vi, i S {1, 2, . . . , iV} . Then, i/fc 
can be viewed as a random weighted sum of column vectors 
hi- 

N 



The next step is to simplify the right-hand side of the above 
inequality by replacing the denominator inside the log by two 
times the dominant term and note that ^05(1 + 2:) > f when 
X < \. In particular, there are two cases: 

• Case 1: /i/i^ > jf^ or equivalently, /i^ > /i, denote ct^ = 
/i/i^ and t = aa . If ^^ct < 2/i/i^ or equivalently, a < 

2(i/M)i 

P(||i/fe|| - E{\\vk\\) >t)< 3exp(-7a2). 

• Case 2: /i^ > /i/i^, denote ct^ = fi^ and t = aa. If 
fifict < or equivalently, a < 

- E{\\tyk\\) >t)< 3exp(-7a2). 

where 7 is some positive constant. 



In conclusion, let a = ■\/max(/i/i2, fi'^). Then, for any a < 
mm{2/ fic, 2 /y/JI): 

-Pdl^fell > Mc + act) < 3exp(-7a^), 



where 7 is some positive constant. 



(18) 
□ 



and ||t/fc|| is a random variable. We have: 

EiWi^kf)^ EiZ,Z,){W„h,) ^ J2 EizDWW.WWh^Unr 



E 

l<i,j<N 



The second lemma is to bound the spectral norm of 



l<i<N 



where the last equality holds due to E{ZiZj) ~ if i ^ j. 
Thus, 



M , 



E{\\v,r) = -{!-- 



N 
M 



N' 



l<i<N 



l<i<N 



Lemma IV.2. (Bound the spectral norm of $U_{\Omega T}^* U_{\Omega T}$)
With high probability, $\|U_{\Omega T}^* U_{\Omega T}\| \ge \frac{1}{2}$.

Proof. The Theorem 1.2 in lfT4l shows that with 
probabiHty 1 - S, \\U*^^-j-Unr\\ > 5 if A/ > 
/.t^ max(ci log_ft', C2 log(3/(5)), where ci and C2 are some 
known positive constants. 



□ 



where the last inequality holds due to ||i7fe|P = tt. This 



Af ■ 



implies that Pdli/fcH) < /Xj,. To show that is concen- 

trated around its mean, we use the Talagrand's theorem of 
concentration inequality ll24l . First, we have: 



And the third lemma is to bound the norm of Wk 



N 



\H\\l= sup V|(^,/i,)|2= sup 

\m=i~t m\= 



N 



N 



< fi sup / 

11^11=1 ,=1 



N 



where the last equation holds because HI/tIII ^ 17- Thus, we 
derive the upper bound of the variance a 



2. 



o^^E{Zl)\\H\\l<^{l 



M,N 2 2 
— — M < M • 



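Lemma IV.3. (Bound the norm of $w_k = \nu_k (U_{\Omega T}^* U_{\Omega T})^{-1}$)
With high probability, $\|w_k\|$ is on the order of $O(\mu_c)$:

$$P\left(\sup_{k \in T^c} \|w_k\| \ge 2\mu_c + 2a\sigma\right) \le 3N\exp(-\gamma a^2) + P\left(\|U_{\Omega T}^* U_{\Omega T}\| \le \tfrac{1}{2}\right), \tag{19}$$

where a, $\gamma$ and $\sigma$ are defined in the proof of Lemma IV.1.

Proof. Let A be the event $\{\|U_{\Omega T}^* U_{\Omega T}\| \ge \frac{1}{2}\}$ or,
equivalently, $\{\|(U_{\Omega T}^* U_{\Omega T})^{-1}\| \le 2\}$, and B be the event
$\{\sup_{k \in T^c} \|\nu_k\| \le \mu_c + a\sigma\}$. Note that

$$\sup_{k \in T^c} \|w_k\| \le \|(U_{\Omega T}^* U_{\Omega T})^{-1}\| \sup_{k \in T^c} \|\nu_k\|.$$

Thus,

$$P\left(\sup_{k \in T^c} \|w_k\| \ge 2\mu_c + 2a\sigma\right) \le P(\bar{A} \cup \bar{B}) \le P(\bar{A}) + P(\bar{B}).$$

Note that $P(\bar{B}) \le 3N\exp(-\gamma a^2)$ implies that (19) holds. □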



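To establish the Exact Recovery Principle, we will show
that $\sup_{k \in T^c} |\langle w_k, z \rangle| < 1$ with high probability. Note that
because z is assumed to be a vector of i.i.d. Bernoulli random
variables, $\langle w_k, z \rangle$ is concentrated around its zero mean. In
particular, according to Hoeffding's inequality:

$$P(|\langle w_k, z \rangle| \ge 1) \le 2\exp\left(-\frac{1}{2\|w_k\|^2}\right)$$

$$\Rightarrow P\left(\sup_{k \in T^c} |\langle w_k, z \rangle| \ge 1 \,\Big|\, \sup_{k \in T^c} \|w_k\| \le \lambda\right) \le 2N\exp\left(-\frac{1}{2\lambda^2}\right).$$

Note that for two arbitrary probabilistic events A and B:

$$P(A) = P(A|B)P(B) + P(A|\bar{B})P(\bar{B}) \le P(A|B) + P(\bar{B}).$$

Now, let A be the event $\{\sup_{k \in T^c} |\langle w_k, z \rangle| \ge 1\}$ and B be
the event $\{\sup_{k \in T^c} \|w_k\| \le \lambda\}$; we derive

$$P\left(\sup_{k \in T^c} |\langle w_k, z \rangle| \ge 1\right) \le 2N\exp\left(-\frac{1}{2\lambda^2}\right) + P\left(\sup_{k \in T^c} \|w_k\| \ge \lambda\right). \tag{20}$$

Choose $\lambda = 2\mu_c + 2a\sigma$. According to (19) and (20), the
probability of interest $P(\sup_{k \in T^c} |\langle w_k, z \rangle| \ge 1)$ is upper
bounded by:

$$3N\exp(-\gamma a^2) + 2N\exp\left(-\frac{1}{2\lambda^2}\right) + \delta.$$

To show that $\{\sup_{k \in T^c} |\langle w_k, z \rangle| < 1\}$ holds with probability $1 -
O(\delta)$, it is sufficient to show that the above upper bound is
not greater than $3\delta$. In particular, choose $a^2 = \gamma^{-1}\log(3N/\delta)$,
which makes the first term equal to $\delta$.

To make the second term less than $\delta$, it is required that

$$\frac{1}{2\lambda^2} \ge \log\left(\frac{2N}{\delta}\right). \tag{21}$$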



Case 1: /Lt^ > /i. The condition that (fTST i holds is a < 
2(l//x)2 that is equivalent to: 

1 



l>-7-Vlog'(3iV/'5). 

It is easy to see > a^, where a — (/i/i^)^/^. In this 
case, A < 4/ic. Thus, (I2TI ) holds if 



2iV 

l>32//2log( ) 



(22) 



Case 2: /i > /i^. The condition that (fTSl l holds is a < 
2/ or equivalently, 

1 > \l-^^il\og{ZN/5). 

If /ic > OCT, where = n, X < Afic and the condition is 
again (l22b . Otherwise, A < 4aCT. In this case, ( 1211 1 holds 
if 

2N 

1 > 327-V'log(^). 



In conclusion, the Exact Recovery Principle is verified if 

1 > ma.x{ci^'^log'^{3N/S),C2iil\og{3N/S)), where a and 
C2 are known positive constants. 



Finally, note that < 0{- 



BM, 



-) and fii < 0{ 



BM 



and 

the assumption that $K \ge 16\log(\frac{2N}{\delta})$, the sufficient condition
for exact recovery is $M \ge O(\frac{N}{B} K \log(\frac{N}{\delta}))$. When F is dense
and uniform, the condition becomes $M \ge O(K \log(\frac{N}{\delta}))$.



□ 



V. Numerical Experiments 

A. Simulation with Sparse Signals 

In this section, we evaluate the sensing performance of
several structurally random matrices and compare it with that
of the completely random projection. We also explore the
connection among sensing performance (probability of exact
recovery), streaming capacity (block size of F) and structure
of the sparsifying basis $\Psi$ (e.g. sparsity and heterogeneity).

In the first simulation, the input signal x of length $N =
256$ is sparse in the DCT domain, i.e. $x = \Psi\alpha$, where
the sparsifying basis $\Psi$ is the $256 \times 256$ IDCT matrix. Its
transform coefficient vector $\alpha$ has K nonzero entries whose
magnitudes are Gaussian distributed and whose locations are
uniformly random, where $K \in \{10, 20, 30, 40, 50, 60\}$. With
the signal x, we generate a measurement vector of length
$M = 128$: $y = \Phi x$, where $\Phi$ is some structurally random
matrix or a completely Gaussian random matrix. SRMs under
consideration are summarized in Table I.

The software $\ell_1$-magic [1] is employed to recover the signal
from its measurements y. For each value of sparsity $K \in
\{10, 20, 30, 40, 50, 60\}$, we repeat the experiment 500 times
and count the probability of exact recovery. The performance
curve is plotted in Fig. 2(a). Numerical values on the x-axis
denote signal sparsity K while those on the y-axis denote
the probability of exact recovery. We then repeat similar
experiments when an input signal is sparse in some sparse
and non-uniform basis $\Psi$. Fig. 2(b) and Fig. 2(c) illustrate
the performance curves when $\Psi$ is the Daubechies-8 wavelet
basis and the identity matrix, respectively.

There are a few notable observations from these experimental
results. First, performance of the SRM with the dense
transform matrix F (all of its entries are non-zero) is on
average comparable to that of the completely random matrix.
Second, performance of the SRM with the sparse transform
matrix F, however, depends on the sparsifying basis $\Psi$ of
the signal. In particular, if $\Psi$ is dense, the SRM with sparse F
also has average performance comparable with the completely
random matrix. If $\Psi$ is sparse, the SRM with sparse F often
has worse performance than the SRM with dense F, revealing a
trade-off between sensing performance and streaming capacity.
These numerical results are consistent with the theoretical
analysis above. In addition, Fig. 2(b) shows that the SRM
with the global randomizer seems to work much better than
the SRM with the local randomizer when the sparsifying basis
$\Psi$ of the signal is sparse.

B. Simulation with Compressible Signals 

In this simulation, signals of interest are natural images of
size $512 \times 512$ such as the Lena, Barbara and Boat
images. The sparsifying basis $\Psi$ used for these natural images
is the well-known Daubechies 9/7 wavelet transform. All
images are implicitly regarded as 1-D signals of length $512^2$.
The GPSR software in [3] is used for signal reconstruction.

TABLE I
SRMs employed in the experiment with sparse signals

Notation     R                    F
WHT64-L      Local randomizer     64 x 64 block-diagonal WHT
WHT64-G      Global randomizer    64 x 64 block-diagonal WHT
WHT256-L     Local randomizer     256 x 256 block-diagonal WHT
WHT256-G     Global randomizer    256 x 256 block-diagonal WHT

For such a large-scale simulation, it takes a huge amount
of system resources to implement the sensing method of a
completely random matrix. Thus, for the purpose of benchmarking,
we adopt a more practical scheme of partial FFT in the
wavelet domain (WPFFT). The WPFFT senses wavelet
coefficients in the wavelet domain using the method of partial
FFT. Theoretically, WPFFT has optimal performance as the
Fourier matrix is completely incoherent with the identity
matrix. The WPFFT is a method of sensing a signal in the
transform domain that also requires a substantial amount of
system resources. SRMs under consideration are summarized
in Table II.

For the purpose of comparison, we also implement two
popular sensing methods: partial FFT in the time domain
(PFFT) [1] and the Scrambled/Permuted FFT (SFFT) in [25],
[26], which is equivalent to the dense SRM using the global
randomizer.

The performance curves of these sensing ensembles are
plotted in Fig. 3(a), Fig. 3(b) and Fig. 3(c), which correspond
to the input signals Lena, Barbara and Boat, respectively.
Numerical values on the x-axis represent the sampling rate,
which is the number of measurements over the total number
of samples. Values on the y-axis are the quality of reconstruction
(PSNR in dB). Lastly, Fig. 4 shows the visually reconstructed
$512 \times 512$ Boat image from 35% of measurements using
WPFFT, WHT32-G and WHT512-L ensembles.
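For readers reproducing the quality axis, PSNR can be computed as in the following minimal sketch. The peak value of 255 assumes 8-bit grayscale images, which the text does not state explicitly, and the function name `psnr` is chosen for illustration.

```python
import numpy as np

def psnr(reference, reconstruction, peak=255.0):
    """Peak signal-to-noise ratio in dB; `peak` = 255 assumes 8-bit images."""
    ref = np.asarray(reference, dtype=float)
    rec = np.asarray(reconstruction, dtype=float)
    mse = np.mean((ref - rec) ** 2)
    if mse == 0.0:
        return float("inf")      # identical images
    return 10.0 * np.log10(peak ** 2 / mse)
```

A difference of 1 dB between two ensembles, as discussed for Fig. 3, corresponds to a ratio of about 1.26 in mean squared error.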

As clearly seen in Fig. 3, the PFFT is not an efficient sensing
matrix for smooth signals like images because the Fourier matrix
and the wavelet basis are highly coherent. On the other hand,
the SRM method, which can roughly be viewed as the PFFT
preceded by the pre-randomization process, is very efficient.
In particular, with a dense SRM like SFFT, the performance
difference between the SRM method and the benchmark
one, WPFFT, is less than 1 dB. In addition, performance of
DCT512-L and WHT512-L, which are fully streaming-capable
SRMs, degrades by about 1.5 dB, which is a reasonable sacrifice
as the buffer size required is less than 0.2 percent of the total
length of the original signal. Less degradation is obtainable
when the buffer size is increased. Also, in all cases, there is
no observable difference in performance between the DCT and
normalized WHT transforms. This implies that orthonormal
matrices whose entries have the same order of absolute magnitude
generate comparable performance. In addition, highly sparse
SRMs using the global randomizer, such as DCT32-G and
WHT32-G, have experimental performance comparable to that
of the dense SRMs. Note that these SRMs are highly sparse
because their density is only $2^{-13}$. This observation again
verifies that SRM with the global randomizer outperforms
SRM with the local randomizer. This might indicate that our
theoretical analysis for the global randomizer is inadequate. In
practice, we believe that the global randomizer always works
as well as, and even better than, the local randomizer. We leave
the theoretical justification of this observation for our future
research.

VI. Discussion and Conclusion 
A. Complexity Discussion 

We compare the computation and memory complexity of
the proposed SRM with those of other random sensing matrices
such as Gaussian or Bernoulli i.i.d. matrices. In implementation,
the i.i.d. Bernoulli matrix is obviously preferred to the
i.i.d. Gaussian one as the former has integer entries $\{1, -1\}$
and requires only 1 bit to represent each entry. An $M \times N$
i.i.d. Bernoulli sensing matrix requires $MN$ bits for storing
the matrix and $MN$ additions and multiplications for the sensing
operation. An $M \times N$ SRM only requires $2N + N\log N$ bits
for storage and $N + N\log N$ additions and multiplications for
the sensing operation. With the SRM method, the computational
complexity and memory space required are independent of
the number of measurements M. Note that with the SRM
method, we do not need to store the matrices D, F, R explicitly.
We only need to store the diagonals of D and of R and the fast
transform F, resulting in significant savings of both memory
space and computational complexity.

Computational complexity and running time of $\ell_1$-minimization
based reconstruction algorithms often depend
critically on whether the matrix-vector multiplications $Au$ and
$A^T u$ can be computed quickly and efficiently (where $A =
\Phi\Psi$) [13]. For the sake of simplicity, assume that $\Psi$ is the
identity matrix. $Au = \Phi u$ requires $MN = O(KN\log N)$
additions and multiplications for a random sensing matrix $\Phi$
and $O(N\log N)$ additions and multiplications for the SRM
method. This implies that at each iteration, SRM can speed up
the reconstruction algorithm by a factor of at least K. With
compressible signals (e.g., images), the number of measurements
acquired tends to be proportional to the signal dimension,
for example, $M = N/4$. In this case, using SRM can reduce
the computational complexity by a factor of
$\frac{N}{4\log N}$.
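The counts quoted above can be tabulated for a concrete size. The sketch below simply evaluates the stated formulas; the base-2 logarithm and the example size $N = 512^2$ with $M = N/4$ are assumptions made for illustration, and the function names are hypothetical.

```python
import math

def bernoulli_cost(M, N):
    """Storage (bits) and operation count for an M x N i.i.d. Bernoulli matrix."""
    return {"storage_bits": M * N, "operations": M * N}

def srm_cost(N):
    """Storage (bits) and operation count for an SRM, using the formulas in the text."""
    nlogn = N * math.log2(N)     # assumption: log taken base 2
    return {"storage_bits": 2 * N + nlogn, "operations": N + nlogn}

N = 512 * 512
M = N // 4
# Per-iteration reconstruction cost ratio MN / (N log N) = N / (4 log N) when M = N/4.
speedup = bernoulli_cost(M, N)["operations"] / (N * math.log2(N))
print(bernoulli_cost(M, N), srm_cost(N), speedup)
```

Note that the SRM figures do not depend on M, in line with the remark that its complexity is independent of the number of measurements.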

Table III summarizes the computational complexity and practical
advantages of SRM compared with a random sensing matrix.
TABLE II
SRMs employed in the experiment with compressible signals

Notation     R                    F
DCT32-G      Global randomizer    32 x 32 block-diagonal DCT
WHT32-G      Global randomizer    32 x 32 block-diagonal WHT
DCT512-L     Local randomizer     512 x 512 block-diagonal DCT
WHT512-L     Local randomizer     512 x 512 block-diagonal WHT

Fig. 4. Reconstructed 512 x 512 Boat images from M/N = 35% sampling rate. (a) The original Boat image; (b) using the WPFFT ensemble: 28.5 dB; (c) using the WHT32-G ensemble: 28 dB; (d) using the WHT512-L ensemble: 27.7 dB.



TABLE III
Practical feature comparison

Features                                       SRMs          Completely Random Matrices
No. of measurements for exact recovery         O(K log N)    O(K log N)
Sensing complexity                             N log N       O(KN log N)
Reconstruction complexity at each iteration    O(N log N)    O(KN log N)
Fast computability                             Yes           No
Block-based processing                         Yes           No
B. Relationship with Other Related Works

When R is the local randomizer, SRM is a little reminiscent
of the so-called Fast Johnson-Lindenstrauss Transform (FJLT)
[27]. However, SRM employs a simpler matrix D. In FJLT,
this matrix D is a completely random matrix with sparse
distribution. It is unknown whether there exists an efficient
implementation of such a sparse random matrix. SRM is relevant
for practical applications because of its high performance and
fast computation.

In [25], [26], the Scrambled/Permuted FFT is experimentally
proposed as a heuristic low-complexity sensing method
that is efficient for sensing a large signal. To the best of our
knowledge, however, there has not been any theoretical analysis
of the Scrambled FFT. SRM is a generalized framework in
which the Scrambled FFT is just a specific case, and thus our
analysis verifies the theoretical validity of the Scrambled FFT.

Random Convolution, which convolves the input signal with a
random pulse and then randomly subsamples the measurements,
is proposed in [19] as a promising sensing method for practical
applications. Although there are a few other methods that
exploit the same idea of convolving a signal with a random pulse,
for example the Random Filter in [17] and the Toeplitz-structured
sensing matrix in [18], only the Random Convolution method
can be shown to approach optimal sensing performance. While
sensing methods such as the Random Filter and Toeplitz-based CS
methods subsample measurements structurally, the Random
Convolution method subsamples measurements in a random
fashion, a technique that is also employed in SRM. In addition,
the Random Convolution method introduces randomness into
the Fourier domain by randomizing the phases of Fourier
coefficients. These two techniques decouple stochastic dependence
among measurements and thus give the Random Convolution
method a higher performance.

SRM is distinct from all aforementioned methods, including 
the Random Convolution one. A key difference is that SRM 
pre-randomizes a sensing signal directly in its original domain 
(via the global randomizer or the local randomizer) while 
the Random Convolution method pre-randomizes a sensing 
signal in the Fourier domain. SRM also extends the Random 
Convolution method by showing that not only Fourier trans- 
form but also other popular fast transforms, such as DCT or 
WHT, can be employed to achieve similar high performance. 
In conclusion, among existing sensing methods, the SRM 
framework presents an alternative approach to design high 
performance, low-complexity sensing matrices with practical 
and flexible features. 

Appendix I 

Central Limit Theorem. Let $Z_1, Z_2, \ldots, Z_N$ be mutually
independent random variables. Assume $E(Z_k) = 0$ and denote
$\sigma^2 = \sum_{k=1}^{N} \mathrm{Var}(Z_k)$. If for a given $\epsilon > 0$ and N sufficiently
large, the following inequalities hold:

$$\mathrm{Var}(Z_k) < \epsilon\sigma^2, \quad k = 1, 2, \ldots, N,$$

then the distribution of the normalized sum $S = \sum_{k=1}^{N} Z_k$
converges to $\mathcal{N}(0, \sigma^2)$.



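Combinatorial Central Limit Theorem. Given two sequences
$\{a_k\}_{k=1}^{N}$ and $\{b_k\}_{k=1}^{N}$. Assume the $a_k$ are not all
equal and the $b_k$ are also not all equal. Let $[\omega_1, \omega_2, \ldots, \omega_N]$ be a
uniform random permutation of $[1, 2, \ldots, N]$. Denote $Z_k = a_{\omega_k}$
and

$$S = \sum_{k=1}^{N} Z_k b_k.$$

S is asymptotically normally distributed $\mathcal{N}(E(S), \mathrm{Var}(S))$ if

$$\lim_{N \to \infty} N \, \frac{\max_{1 \le k \le N}(a_k - \bar{a})^2}{\sum_{k=1}^{N}(a_k - \bar{a})^2} \, \frac{\max_{1 \le k \le N}(b_k - \bar{b})^2}{\sum_{k=1}^{N}(b_k - \bar{b})^2} = 0,$$

where

$$\bar{b} = \frac{1}{N}\sum_{k=1}^{N} b_k \quad \text{and} \quad \bar{a} = \frac{1}{N}\sum_{k=1}^{N} a_k.$$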

Appendix II 

Hoeffding's Concentration Inequality. Suppose
$X_1, X_2, \ldots, X_N$ are independent random variables and
$a_k \le X_k \le b_k$ ($k = 1, 2, \ldots, N$). Define a new random
variable $S = \sum_{k=1}^{N} X_k$. Then for any $t > 0$,

$$P(|S - E(S)| \ge t) \le 2\exp\left(-\frac{2t^2}{\sum_{k=1}^{N}(b_k - a_k)^2}\right).$$
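As a sanity check of how this inequality is used for $\langle w_k, z \rangle$ with a random sign vector z, the bound can be compared against a Monte Carlo estimate. This is an illustrative sketch only: the vector `w`, the number of trials and the function names are arbitrary assumptions, not quantities from the paper.

```python
import numpy as np

def hoeffding_bound(t, a, b):
    """Right-hand side of Hoeffding's inequality for bounds a_k <= X_k <= b_k."""
    a = np.asarray(a, dtype=float)
    b = np.asarray(b, dtype=float)
    return 2.0 * np.exp(-2.0 * t ** 2 / np.sum((b - a) ** 2))

def empirical_tail(w, t, trials, rng):
    """Monte Carlo estimate of P(|<w, z>| >= t) for z with i.i.d. +/-1 entries."""
    z = rng.choice([-1.0, 1.0], size=(trials, w.size))
    return np.mean(np.abs(z @ w) >= t)

rng = np.random.default_rng(0)
w = 0.3 * np.ones(50) / np.sqrt(50.0)        # ||w|| = 0.3 (arbitrary example)
# X_k = w_k z_k lies in [-|w_k|, |w_k|], so the bound becomes 2 exp(-t^2 / (2 ||w||^2)).
bound = hoeffding_bound(1.0, -np.abs(w), np.abs(w))
estimate = empirical_tail(w, 1.0, trials=20000, rng=rng)
print(bound, estimate)
```

With $t = 1$ the bound reduces to $2\exp(-1/(2\|w\|^2))$, the form used when verifying the Exact Recovery Principle.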

Ledoux's Concentration Inequality. Let $\{\eta_i\}_{1 \le i \le N}$ be a
sequence of independent random variables such that $|\eta_i| \le 1$
almost surely and $v_1, v_2, \ldots, v_N$ be vectors in a Banach space.
Define a new random variable: $S = \|\sum_{i=1}^{N} \eta_i v_i\|$. Then for
any $t > 0$,

P{S>EiS)+t)<2expi~-^) 

where $\sigma^2$ denotes the variance of S and $\sigma^2 =
\sup_{\|u\| \le 1} \sum_{i=1}^{N} |\langle u, v_i \rangle|^2$.

Talagrand's Concentration Inequality. Let $Z_k$ be zero-mean
i.i.d. random variables, bounded as $|Z_k| \le \lambda$, and $u_k$ be
column vectors of a matrix U. Define a new random variable:
$S = \|\sum_{k=1}^{N} Z_k u_k\|$. Then for any $t > 0$:

$$P(S \ge E(S) + t) \le 3\exp\left(-\frac{t}{cB}\log\left(1 + \frac{Bt}{\sigma^2 + B E(S)}\right)\right),$$

where c is some constant, the variance $\sigma^2 = E(Z_k^2)\|U\|^2$ and
$B = \lambda \max_{1 \le k \le N} \|u_k\|$.

Sourav's Concentration Inequality. Let $\{a_{ij}\}_{1 \le i,j \le N}$ be a
collection of numbers from $[0, 1]$. Let $[\omega_1, \omega_2, \ldots, \omega_N]$ be a
uniformly random permutation of $[1, 2, \ldots, N]$. Define a new
random variable: $S = \sum_{i=1}^{N} a_{i\omega_i}$. Then for any $t \ge 0$,

$$P(|S - E(S)| \ge t) \le 2\exp\left(-\frac{t^2}{4E(S) + 2t}\right).$$
Fig. 2. Performance curves: probability of exact recovery vs. sparsity K. (a) When $\Psi$ is the IDCT basis. (b) When $\Psi$ is the Daubechies-8 wavelet basis. (c) When $\Psi$ is the identity basis.
Fig. 3. Performance curves: quality of signal reconstruction vs. sampling rate M/N. (a) The 512 x 512 Lena image. (b) The 512 x 512 Barbara image. (c) The 512 x 512 Boat image.



